Reward maximization assessed using a sequential patch depletion task in a large sample of heterogeneous stock rats

Choice behavior requires animals to evaluate both short- and long-term advantages and disadvantages of all potential alternatives. Impulsive choice is traditionally measured in laboratory tasks by utilizing delay discounting (DD), a paradigm that offers a choice between a smaller immediate reward, or a larger more delayed reward. This study tested a large sample of Heterogeneous Stock (HS) male (n = 896) and female (n = 898) rats, part of a larger genetic study, to investigate whether measures of reward maximization overlapped with traditional models of delay discounting via the patch depletion model using a Sequential Patch Depletion procedure. In this task, rats were offered a concurrent choice between two water “patches” and could elect to “stay” in the current patch or “leave” for an alternative patch. Staying in the current patch resulted in decreasing subsequent reward magnitudes, whereas the choice to leave a patch was followed by a delay and a resetting to the maximum reward magnitude. Based on the delay in a given session, different visit durations were necessary to obtain the maximum number of rewards. Visit duration may be analogous to an indifference point in traditional DD tasks. Males and females did not significantly differ on traditional measures of DD (e.g. delay gradient; AUC). When examining measures of patch utilization, females made fewer patch changes at all delays and spent more time in the patch before leaving for the alternative patch compared to males. Consistent with this, there was some evidence that females deviated from reward maximization more than males. However, when controlling for body weight, females had a higher normalized rate of reinforcement than males. Measures of reward maximization were only weakly associated with traditional DD measures and may represent distinctive underlying processes. 
Taken together, females' performance differed from that of males on measures of reward maximization in ways that were not observed with traditional measures of DD, suggesting that the patch depletion model was more sensitive to modest sex differences than traditional DD measures in a large sample of HS rats.


Results
Delay discounting. Performance in the Sequential Patch Procedure was first analyzed using a traditional DD approach, which focused on the rejection volumes at the different change over delay (COD) travel times (Fig. 1a). Sex differences were insufficiently large to be reflected in the hyperbolic discounting function gradient (k; Fig. 1c) or AUC (Fig. 1d). The lack of difference at 0 s indicates that males and females do not differ on the bias parameter of the hyperbolic equation (b; Fig. 1b).
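As a rough illustration of the two DD summary measures named above (the discounting gradient k with a bias term b, and the normalized AUC), the sketch below evaluates a hyperbolic discount function and computes a trapezoid-rule AUC. The functional form V = b·A/(1 + k·D), the function names, the value k = 0.05, and the example indifference points are assumptions made for illustration; they are not taken from the study's analysis code or data.

```python
def hyperbolic_value(amount, delay, k, b=1.0):
    """Discounted value under an assumed hyperbolic form: b * A / (1 + k * D)."""
    return b * amount / (1.0 + k * delay)


def normalized_auc(delays, indifference_points, max_amount):
    """Trapezoid-rule AUC with delays scaled by the longest delay
    and indifference points scaled by max_amount, so the result lies in [0, 1]."""
    max_delay = max(delays)
    xs = [d / max_delay for d in delays]
    ys = [v / max_amount for v in indifference_points]
    area = 0.0
    for i in range(1, len(xs)):
        area += (xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2.0
    return area


delays = [0, 6, 12, 18, 24]  # COD values used in the task (s)
# Hypothetical indifference points (µL), generated from the assumed function.
points = [hyperbolic_value(150.0, d, k=0.05) for d in delays]
auc = normalized_auc(delays, points, max_amount=150.0)
```

Under these assumptions a steeper gradient (larger k) yields a smaller AUC, which is consistent with the negative association between k and AUC described later in the Results.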
Patch utilization. Males and females displayed widespread differences in resource acquisition. Water consumption was evaluated by controlling for body weight [29][30][31]. A two-way mixed ANOVA with sex as the between-subject factor and delay (0, 6, 12, 18, and 24 s) as the within-subject factor was conducted to determine if sex moderated water consumption (µL/min/kg). There was a significant interaction between delay and sex [F(4,7168) = 60.502, p < 0.001, ηp² = 0.033] and significant main effects of delay [F(4,7168) = 5672.762, p < 0.001, [...] (Fig. 2a). This sex difference was consistent when the normalized AUC was determined for the water consumption data, with females having significantly greater AUC than males [t(1792) = −59.779, p < 0.001, d = −2.823; Supplementary Fig. 1a].
These effects were driven by differences in several associated variables. A two-way mixed ANOVA examining the number of patch changes/leave choices identified a significant delay × sex interaction [F(4,7168) = 3.805, p < 0.01] and significant main effects of delay [F(4,7168) = 5387.924, p < 0.001] and sex [F(1,1792) = 15.933, p < 0.001]. Effect sizes (ηp²) were as follows: delay = 0.75; sex = 0.009. Bonferroni-corrected post-hoc pairwise comparisons showed that patch changes were significantly different for each delay, indicating that the number of times the rats changed patches decreased as the delay increased. Independent samples t tests with Bonferroni corrections showed that females exhibited significantly fewer patch changes (leave choices) than males at all delays tested [0 s, t(1792) = 3.293, p < 0.001, d = 0.156; 6 s, t(1792) = 3.366, p < 0.001, d = 0.159; 12 s, t(1792) = 4.082, p < 0.001, d = 0.193; 18 s, t(1792) = 2.869, p = 0.004, d = 0.135; 24 s, t(1792) = 2.840, p = 0.005, d = 0.134] (Fig. 2b). This sex difference was consistent when the normalized AUC was determined for the patch changes data, with males having significantly greater AUC than females [t(1792) = 4.007, p < 0.001, d = 0.189; Supplementary Fig. 1b]. The difference in number of patch changes was accompanied by sex differences in the mean amount of time spent in patches; a two-way mixed ANOVA revealed a significant interaction between delay and sex [F(4,7168) = 5.710, p < 0.001] and significant main effects of [...] Effect sizes (ηp²) were as follows: delay = 0.783; delay × sex = 0.000; sex = 0.008. Bonferroni-corrected post-hoc pairwise comparisons showed that the amount of time in patch was significantly different for each delay, indicating that rats increased the amount of time spent in the patch as the delay increased.
Independent samples t tests with Bonferroni corrections revealed that females spent significantly more time in the patch than males before leaving for a new patch at 6- and 12-s delays [6 s, t(1792) = −4.312, p < 0.001, d = −0.204; 12 s, t(1792) = −5.375, p < 0.001, d = −0.254] (Fig. 2c). This sex difference was consistent when the normalized AUC was determined for the time-in-patch data, with females having significantly greater AUC than males [t(1792) = −4.221, p < 0.001, d = −0.199; Supplementary Fig. 1c]. Taken together, these data indicate sexually dimorphic performance on the characteristics of patch visits in the sequential patch depletion task, with females making fewer patch changes and remaining in a patch for longer periods of time, resulting in a higher normalized rate of reinforcement.
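The Cohen's d values reported with the t tests above can be cross-checked against the t statistics. For an independent-samples t test with pooled variance, d = t·sqrt(1/n1 + 1/n2). The sketch below applies this standard identity to two of the reported comparisons; the function name is illustrative, and whether the authors computed d this way is an assumption.

```python
import math


def cohens_d_from_t(t, n1, n2):
    """Cohen's d implied by an independent-samples t statistic (pooled variance)."""
    return t * math.sqrt(1.0 / n1 + 1.0 / n2)


N_MALES, N_FEMALES = 896, 898  # group sizes reported in the text

# Patch changes at the 6-s delay: reported t(1792) = 3.366, d = 0.159
d_patch_changes_6s = cohens_d_from_t(3.366, N_MALES, N_FEMALES)

# Normalized AUC for water consumption: reported t(1792) = -59.779, d = -2.823
d_water_auc = cohens_d_from_t(-59.779, N_MALES, N_FEMALES)
```

Both computed values agree with the reported effect sizes to within rounding, which suggests the reported d values follow directly from the t statistics and group sizes.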
To assess whether these patch utilization patterns generally reflected a reward maximization strategy (according to the MVT), we assessed differences in deviations from the optimal time in patch; a two-way mixed ANOVA identified a significant interaction between delay and sex [F(4,7168) = 4.83, p < 0.001] and significant main effects of delay [F(4,7168) = 3478.69, p < 0.001] and sex [F(1,1792) = 13.293, p < 0.001]. Effect sizes (ηp²) were as follows: delay = 0.660; delay × sex = 0.003; sex = 0.007. As can be seen in Fig. 3a, rats spent more time in the patch than predicted by our reward maximization calculation when the CODs were short, but this deviation declined systematically until the time in patch approximated optimal predictions (COD of 12 s) and then fell below predictions, toward underharvesting (leaving the patch sooner than optimal). Post-hoc independent samples t tests with Bonferroni corrections revealed that the deviation from the optimal time in patch was more [...] (Fig. 3a).
Figure 1. [...] The outlines of the violin plots illustrate the average kernel probability density (i.e., the width of the colored area represents the proportion of the data for that variable). The dashed lines indicate the median, and dotted lines indicate quartiles. (b) This plot represents the bias parameter, b, for one response alternative. No significant difference between sexes was observed. (c) This plot represents the discount parameter, k, a free parameter that indicates the rate of reinforcer devaluation as a result of the delayed delivery of the reward. The k values of the best fitting hyperbolic discount functions indicated that sex did not affect delay discounting (DD). (d) This plot represents the normalized area under the curve (AUC) of the indifference point as a function of delay (ratio). No significant difference between sexes was observed. *p < 0.05, **p < 0.01, ***p < 0.001 between males (n = 896) and females (n = 898).
To further evaluate reward maximization, the observed rejection volume was compared to the optimal rejection volume (predicted by the MVT); a two-way mixed ANOVA identified a significant interaction between delay and sex [F(4,7168) = 10.109, p < 0.001] and a significant main effect of delay [F(4,7168) = 1428.742, p < 0.001]. Effect sizes (ηp²) were as follows: delay = 0.444; delay × sex = 0.006. Females deviated from the optimal rejection volume significantly more than males at delays of 6 and 12 s [6 s, t(1792) = −2.672, p < 0.001, d = −0.126; 12 s, t(1792) = −3.287, p = 0.005, d = −0.155], whereas males deviated significantly more than females at 24 s [t(1792) = 2.032, p < 0.01, d = 0.096] (Fig. 3b). Relative to MVT predictions, both sexes exhibited overharvesting (leaving the patch after it would be considered optimal) at 0- and 6-s delays, but at longer delays (12-24 s), there was a shift toward optimal. This altered patch leaving as a function of COD is clearly illustrated by changes in the frequency distributions of rejection volumes across delays; the percentage of animals overharvesting decreased as delay increased (Fig. 4). These data indicate that rats' behavioral choices deviate from the values predicted by the MVT.
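The partial eta squared values reported alongside the ANOVAs can be recovered from each F statistic and its degrees of freedom, because partial eta squared equals F·df_effect / (F·df_effect + df_error). The short sketch below applies this identity to the rejection-volume ANOVA above; the function name is illustrative and not part of the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared implied by an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Rejection-volume deviation ANOVA, values as reported in the text
eta_delay = partial_eta_squared(1428.742, 4, 7168)       # reported as 0.444
eta_interaction = partial_eta_squared(10.109, 4, 7168)   # reported as 0.006
```

Both results match the reported effect sizes after rounding to three decimals.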
Strengths of associations between patch utilization, reward maximization, and DD. Table 1 shows the correlations amongst traditional DD composite variables (k and AUC), patch utilization composite variables (patch changes, time in patch, and rate of water reinforcement), and reward maximization composite variables (time and volume deviations) in male and female rats. For both sexes, there were strong significant negative associations between k and AUC values (p values < 0.001; Table 1). DD composite variables significantly correlated with the composite scores for patch changes for males and females (p values < 0.05). However, while significant associations were observed between patch utilization composite variables (time in patch, patch changes) and DD composite variables (k and AUC; all p values < 0.05), these associations appeared weaker than the associations observed between patch utilization variables and reward maximization variables (time and volume deviation; all p values < 0.05). Weak to no associations were observed for both DD and reward maximization and rate of water reinforcement (r = 0.3 < 0.03, see Table 1).
Figure 3. (a) [...] Positive values indicate staying longer than optimal, whereas negative values indicate leaving before optimal. Male heterogeneous stock rats showed significantly greater reward optimization at delays of 0-6 s, whereas female rats showed significantly greater reward maximization at delays of 12-18 s. (b) Percent deviation from optimal rejection volume. Observed rejection volume (indifference point) was compared to the optimal rejection volume for reward maximization. Positive values indicate a rejection volume less than optimal, and negative values indicate a rejection volume greater than optimal. Males showed significantly greater reward optimization at delays of 6 and 12 s, whereas females showed significantly greater reward maximization at a delay of 24 s. Data are expressed as the average ± standard error; *p < 0.05, **p < 0.01, ***p < 0.001 between males (n = 896) and females (n = 898).
To compare the strength of the associations between DD variables and reward maximization for each of the patch utilization variables, we applied the method described by Meng et al. 32 , which compares sets of nonindependent overlapping correlation coefficients via a z-test procedure. For most comparisons, there were differences in the strength of associations (see Table 2 for z-scores). Correlations between patch utilization composite scores and reward maximization were significantly stronger than correlations between patch utilization and DD variables in both males and females (Tables 1 and 2). For example, correlation coefficients between number of patch changes and k (males: r = −0.30; females: r = −0.35) were significantly weaker than correlation coefficients observed between number of patch changes and time deviation (males: r = 0.87; females: r = 0.88; see Table 2 for z-scores). Similarly, correlation coefficients between number of patch changes and AUC (males: r = 0.45; females: r = 0.47) were significantly weaker than correlation coefficients observed between number of patch changes and time deviation (males: r = 0.87; females: r = 0.88; see Table 2 for z-scores). This pattern in the strength of associations was repeated for time in patch (Patch utilization correlation coefficients: Reward maximization > DD, as shown in Table 1). A different pattern emerged for normalized rate of water reinforcement. Here, both DD and reward maximization variables were either weakly or not significantly correlated with normalized rate of water reinforcement. Time deviation correlation coefficients were similar to those observed for k, and AUC correlation coefficients were similar to correlation coefficients for volume deviation. These results were confirmed using correlation comparison approaches in the R package 33,34 . Taken together, these data indicate that while standard DD variables are associated with patch utilization, these associations are weak relative to associations between patch utilization and reward maximization in both males and females.
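The z-test of Meng et al. for two correlations that share one variable can be sketched as follows. The example compares the male correlations of patch changes with time deviation (r = 0.87) and with AUC (r = 0.45). The correlation between the two non-shared variables (r_x, here AUC with time deviation) is not given in this section, so the value 0.40 is a placeholder assumption, and the function name is illustrative. The study itself used an R package, not this code.

```python
import math


def meng_z(r1, r2, r_x, n):
    """Meng, Rosenthal & Rubin (1992) z for comparing r(Y, X1) = r1 with r(Y, X2) = r2,
    where r_x = r(X1, X2) and n is the sample size."""
    z1, z2 = math.atanh(r1), math.atanh(r2)       # Fisher z transforms
    r_sq_bar = (r1 ** 2 + r2 ** 2) / 2.0          # mean squared correlation
    f = min((1.0 - r_x) / (2.0 * (1.0 - r_sq_bar)), 1.0)  # f is capped at 1
    h = (1.0 - f * r_sq_bar) / (1.0 - r_sq_bar)
    return (z1 - z2) * math.sqrt((n - 3) / (2.0 * (1.0 - r_x) * h))


# r1, r2 and n are taken from the text; r_x = 0.40 is a hypothetical placeholder.
z_males = meng_z(r1=0.87, r2=0.45, r_x=0.40, n=896)
significant = abs(z_males) > 1.96  # two-tailed test at alpha = 0.05
```

With a sample this large, even modest differences between overlapping correlations produce large z values, so the size of the difference in r is more informative than the significance test alone.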

Discussion
The Marginal Value Theorem (MVT) suggests that optimal foragers will choose to stay in a patch until the rate of reward falls below the average rate of reward in alternative patches 3 . Consequently, longer travel delays between patches make staying in a depleting patch a more optimal choice because the longer delay depresses the average rate of reward, resulting in foragers remaining in the patch for a longer period of time. Conversely, shorter travel delays between patches make staying in a depleting patch for a long period of time a less optimal choice, resulting in "leave" choices occurring sooner. This is supported by the current experiment, in which CODs (representing travel delays) altered choice behavior. Increasing delays resulted in a decrease in the number of patch changes and an increase in time in patch as rats made more "stay" choices. The sequential patch depletion task is a valuable paradigm for understanding important processes underlying optimal behavior because it has strong ethological validity. Another important strength of this paradigm is the ability to examine reward maximization in greater depth. By assessing the percent deviation from optimality, which normalizes the data to map reward maximization for various delays, we found that animals tended to overharvest at short delays, as indicated by remaining in a patch longer than optimal. Conversely, rats underharvested when the delays were longer, as indicated by animals leaving the patch sooner than optimal. Notably, the greatest deviation from the optimal time in patch occurred at the shortest delay (0 s). These data may imply that despite the absence of a travel cost, some other unidentified variable influences the rat's choice to stay in the current patch. While the focus of the MVT is on travel costs, other factors can also contribute to choices to stay in a patch, such as predation and energy expenditure 1,2,4 . In the current study, the effort needed to change patches may have been weighted more at shorter delays. Overharvesting observed at shorter delays may also be due to the nature of the sequential patch procedure. In this task, the reward is presented and there is a 4 s delay prior to presentation of the subsequent reward. Part of this time is spent consuming the initial reward, so it may be the case that the rewards are presented close enough together in time that the delay is not noticed until the volume is substantially lessened from the original volume. It is unclear if these variables are contributing to the sex differences found in this study. Future research is needed to tease apart how these factors may contribute to overharvesting. Our data demonstrate that foraging behavior and traditional measures of impulsive choice, such as those provided by traditional DD indices, may share some characteristics of behavioral responding but may more strongly reflect distinct behavioral processes.
Table 1. Pearson correlation matrices for male and female rats. k free parameter discounting index, VD volume deviation, TIP time in patch, PC number of patch changes, WR water reinforcement (μl/min/kg). *Correlation is significant at the 0.05 level. **Correlation is significant at the 0.01 level (two-tailed analysis).
Table 2. z-scores for relative strengths of correlations between delay discounting and reward maximization metrics in male and female rats. TIP time in patch, PC number of patch changes, WR water reinforcement (μl/min/kg), k free parameter impulsivity index, VD volume deviation, TD time deviation, AUC area under the curve for indifference points. Significant difference between correlation coefficients using the z-test procedure (*p < 0.05; **p < 0.01) outlined by Meng et al. (1992) and the R cocor package (Diedenhofen & Musch, 2015). Variable in parentheses indicates statistically stronger correlation in comparison.
We observed significant associations between DD and reward maximization variables. Of note, however, reward maximization variables were more strongly associated with patch utilization variables than were the DD variables. Consistent with this, Hayden, Pearson, & Platt 9 showed that the performance of monkeys in a foraging task fit better with the MVT model than with a hyperbolic DD function; they obtained a similar result when testing a patch-leaving foraging task interleaved with a traditional DD task 35 .
Sex differences in DD, patch utilization and reward maximization. Greater DD, and by extension impulsive choice, is indicated by longer stays in a depleting patch with smaller rewards. Although females had lower indifference points than males at some delays, no significant differences in hyperbolic k values or AUC were found. Relatively few animal studies have examined sex differences in DD, and these studies have produced conflicting results. Larger DD has been found in female rats 36,37 and mice 38 , whereas other studies have identified larger DD in male rats 39,40 , age-dependent sex differences 41 , or no sex differences [42][43][44][45] . Similar discrepancies have been reported in human studies, with greater discounting observed in women in some studies 15,23,46,47 and in men in others [48][49][50] . Even more studies of humans have found no differences [51][52][53][54][55][56][57][58] . These disparate findings call for a more comprehensive examination of these differences in impulsive choice, as a foundation on which to explore the interrelationships between gender and DD in humans as well as their roles in psychopathologies (e.g. substance abuse).
In the present study, while both sexes deviated from optimal performance, females were less successful in reward maximization at relatively short delay times (i.e., in decisions about time spent in patch and the amount of reward to reject). When controlling for body weight, however, the opposite pattern was observed: females exhibited a higher rate of reinforcement despite making fewer patch changes and spending more time in patches than males. This effect was only observed as a function of body weight, which suggests that the relationship between normalized reinforcement rate and reward maximization is complex. The current sequential patch depletion task was not designed such that the normalized reinforcement rate would be greater if the time spent in patch and the amount of reward rejected were closer to the optimizing strategy. Future studies should adjust the procedure to further dissect the sex differences observed in the present study and to reconcile these inconsistencies.
These data are related to findings of sex differences in humans in the Iowa Gambling Task, in which men had a greater preference for cards that were advantageous in the long-term compared to women 59 . From this, van den Bos 59 posited that males tend to focus more on long-term goals, shifting from exploration to exploitation, whereas females exhibited greater exploratory behavior 38,60 . The findings from the current study offer nuances to this hypothesis. When controlling for body weight, the female rats in our study had a greater rate of reinforcement than males, which does not support the interpretation that the males focused on long-term goals. Furthermore, females did not exhibit greater exploratory behavior as they made fewer patch changes and stayed in the patch significantly longer than males. Instead, the greater rate of switching between patches observed in males may reflect greater behavioral flexibility, whereas females may be demonstrating more perseverative behavior 61 . Reports of sex differences in perseverative behaviors have been inconsistent [62][63][64][65] , and thus, further research is needed to reconcile these discrepancies. This is of particular importance because perseveration is a feature of psychological disorders (e.g. schizophrenia, autism, OCD and drug addiction) 66 , and understanding these processes may have implications in sex-dependent vulnerability.
There are a variety of possible explanations for why females stayed longer in a patch. For example, there are sex differences in energy expenditure 67,68 , which may result in a female bias to preserve energy. Greater "overharvesting" observed in female rats at short delays may be an inaccurate interpretation: females may have exhibited appropriate levels of harvesting given environmental or biological factors, such as those related to reproductive success, which made staying in the patch a more advantageous choice. Furthermore, their strategy may be to fully deplete the patch and exploit all resources before moving on to the next patch, as overharvesting may not abrogate visiting the patch in the future when resources have been replenished. In addition, females may have a differential sensitivity to tracking a changing environment, or to cues of change 69 . Indeed, Tropp & Markus 70 found that males and females utilize cues differently in various environments. When animals are presented with diminishing rewards upon the choice to stay in a patch, not only are they being reinforced, but they are also gaining new information about the quality of the reward, an important component of the economics of choice behavior 1,2 . Finally, females may tend to choose safer options, which is supported by studies in rats [71][72][73][74] . Taken together, these data suggest that the patch depletion model, and by extension reward maximization, was more sensitive to modest sex differences than traditional DD tasks and metrics.
Future directions. While we report differences between males and females in reward maximization, it is important to note we did not track estrous cycles in the females, so we cannot determine if this affected their performance on this task. To date, there is little evidence regarding the hormonal role in mediating these behavioral processes. Here, we present statistically different, but admittedly modest sex differences in performance on the sequential patch depletion task. This may be, in part, due to the role of the estrous cycle in performance on these tasks. In keeping with the literature exploring sex differences on DD, there is limited and conflicting evidence for the role of the estrous cycle on DD 36,75,76 . Future research is needed to determine the impact of the estrous cycle on performance in the patch depletion procedure.
The sequential patch model assumes that foragers have perfect knowledge of the model's parameters, identified by Stephens 2 as the "complete information assumption." While the rats in our study had substantial training, this assumption may influence how we interpret their behavior. The differences between rewards associated with optimal rejection volumes were small, which may have contributed to the deviations from optimality. This issue has been noted previously in analyses of optimal performance on progressive schedules with reset [77][78][79] .
This is the first report of sex differences in a large sample of Heterogeneous Stock rats tested using the sequential patch depletion procedure. By using a large sample of Heterogeneous Stock rats, we showed that females' performance differed from that of males on measures of reward maximization in ways that were not revealed by traditional measures of DD. Notably, frequency distributions of rejection volume indicate that there was also sizeable variability in rejection volumes within each sex, which will enable future studies to explore individual differences or genetic/environmental variables that contribute to performance in this task. Furthermore, measures of reward maximization were only weakly associated with DD variables and thus may be mediated by different underlying processes. Taken together, these data show the utility of the sequential patch depletion procedure for measuring choice and may have implications for vulnerability to the development of psychiatric diseases.
Rats were housed in same-sex pairs in plastic cages (42 × 22 × 19 cm) lined with bedding (Aspen Shavings). Prior to the start of the experiment, rats (n = 1590) were first tested on four behavioral tasks (social reinforcement, locomotor response to novelty, light reinforcement, and choice reaction time; data are not reported here). A subset of animals (n = 204) were not tested on the social reinforcement test because it was introduced after the first two batches had already been tested. At the onset of data collection, the mean (± SEM) age of the rats was postnatal day 136.58 ± 0.29 and weight was monitored over the course of the experiment.
Behavioral testing was conducted 6 days/week (Monday through Saturday) during the dark phase of the light-dark cycle between the hours of 08:30 and 12:30. Food (Teklad Laboratory Diet #8604) was available ad libitum in the home cages. Access to water was restricted to 30 min immediately following testing on Monday through Friday. At the end of the testing on Saturday, animals were given free access to water until approximately 12:30 h on Sunday (approximately 20 h prior to testing on Monday).
This study was conducted in accordance with protocols approved by the Institutional Animal Care and Use Committee at the University at Buffalo, and animals were treated in compliance with the Guide for the Care and Use of Laboratory Animals, and the study is reported in accordance with the ARRIVE guidelines 81 .
Apparatus. Testing occurred in 24 locally constructed operant chambers (24 × 22 × 20 cm; Fig. 5) housed in sound-attenuating cabinets (Model # 3,000,000,187, Coleman, Wichita, KS), which were previously described in detail 82 . Briefly, the chambers had stainless-steel rod floors, aluminum back and side walls, and a Plexiglas front wall and top. Each test chamber had three snout poke receptacles (4 cm in diameter) located in the back and side walls. Infrared photobeam detectors, located 1 cm from the snout poke receptacle entrance, were used to record snout pokes. Three stimulus lights were located above each snout poke receptacle, and a fourth light was located in the ceiling of the test chamber. A Sonalert tone generator (SC628EJR; Mallory Sonalert Products, Indianapolis, IN) mounted on the right wall provided a pulsed 1.9-kHz tone. Acrylic dishes were located inside the left and rear snout poke receptacles and were connected to Tygon tubing, which delivered precise amounts of water from 60 ml syringes mounted on two single-speed syringe pumps (3.33 rpm, PHM-100; Med Associates, St. Albans, VT) external to the sound-attenuating cabinet.
The apparatus was controlled by MED-PC IV software (RRID:SCR_012156) with 1 ms temporal resolution running on computers with Microsoft Windows operating systems. Equipment was tested before test sessions and following any session in which a rat earned fewer than 30 reinforcers.
Sequential patch depletion procedure. Behavior was measured using a sequential patch depletion procedure that was previously described (Fig. 6a) 26,83 . Briefly, during this task, water-restricted rats were offered a concurrent choice between two water "patches" (left- or back-wall snout poke receptacles) and could elect to "stay" in the current patch or "leave" for an alternative patch at any time. A snout poke in the receptacle (i.e., entering the patch) resulted in immediate delivery of 150 µl water. If rats made a choice to "stay" in this patch, successively smaller amounts of water were delivered. To simulate patch depletion, the volume of water for successive "stay" choices was reduced 20% for each reinforcer presentation. For example, the first reward volume was 150 µl of water, the second reward was 120 µl, and the third reward was 96 µl, etc., with each reward delivery separated by a minimum of 4 s (Fig. 6a,b). Once in a patch, water was available according to a modified Fixed Interval (FI) 4-s schedule; each water delivery was followed by a 4-s interval during which the reinforcer was unavailable. However, successive "stay" choices were achieved in one of two ways: rats could emit a snout poke to the same receptacle as the previous response following the 4-s interval (traditional FI contingency) or the rat could remain with its snout in the receptacle for the duration of the 4-s interval (analogous to a Fixed Time or FT schedule).
If a rat made the choice to "leave" a patch by poking its snout in the alternative receptacle ("alternative patch"), a changeover delay (COD) was imposed to simulate "travel" cost 84 . During the COD, reinforcers were not available at either location regardless of responding. When the rat left a patch to travel to the alternative patch, the abandoned patch was replenished (the volume of the water was reset to 150 µl for the first reinforced response on returning to that patch).
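The depletion and replenishment rules described above can be sketched in a few lines. The constants come from the text (150 µl initial volume, 20% reduction per "stay" reward); the function names are illustrative and are not taken from the MED-PC program used in the study.

```python
INITIAL_VOLUME_UL = 150.0  # first reward on entering (or re-entering) a patch
DEPLETION_FACTOR = 0.8     # each successive "stay" reward is 20% smaller


def stay_volumes(n_rewards):
    """Volumes (µl) of the first n_rewards deliveries within one patch visit."""
    return [INITIAL_VOLUME_UL * DEPLETION_FACTOR ** i for i in range(n_rewards)]


def rejection_volume(n_rewards_taken):
    """Volume (µl) that would have been delivered next had the rat stayed."""
    return INITIAL_VOLUME_UL * DEPLETION_FACTOR ** n_rewards_taken


visit = stay_volumes(3)                      # approximately 150, 120 and 96 µl
left_after_two = rejection_volume(2)         # approximately 96 µl
volume_after_leaving = stay_volumes(1)[0]    # abandoned patch resets to 150 µl
```

Because a "leave" choice resets the abandoned patch, every visit restarts the same geometric sequence, so the only quantities that vary between visits are the number of rewards taken and the COD paid to switch.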
When a patch change occurred with a 0-s COD, the stimulus light above the abandoned patch was extinguished, and the stimulus light above the newly poked receptacle was illuminated simultaneously with the delivery of 150 µl of water. For CODs > 0 s (6, 12, 18, and 24 s) [...] tone was turned off and the stimulus light above the newly poked receptacle was illuminated. The first snout poke into the new location after the onset of the stimulus light resulted in the delivery of 150 µl of water. Sessions lasted for 10 min or until the rat earned a cumulative total of 5 ml of water, whichever occurred first. The COD was constant within a session but varied between sessions (i.e., days of the week) in the following sequence (Monday through Saturday): 0, 0, 6, 12, 18, and 24 s, with the first session of the week being excluded from data analysis (i.e., the first 0 s session). This 6-day cycle was repeated four times for a total of 24 test sessions. Data from the last two cycles of sessions for each COD were averaged for data analysis.

Figure 6. Schematic illustration of the sequential patch depletion procedure. (a) Rats are offered a sequential choice between two "patches" (snout poke receptacles). "Stay" choices (snout poke response to the same receptacle) resulted in presentation of decreasing volumes of water (r = reward; μl = microliters of water). "Leave" choices (snout poke response to the alternate receptacle) were followed by a changeover delay (travel cost) and presentation of the initial larger volume of water (150 μL). Following a "leave" choice, the abandoned patch is replenished to the original reward volume (150 μL). See text for full explanation. (b) Plot representing the volume of water across time in patch. The solid line represents the cumulative volume of water earned for successive stay choices in a patch. The dashed line represents the diminishing volume of water available for staying in the patch. (c) Plot representing the optimal rejection volume of water and time in patch across all delays tested according to the Marginal Value Theorem. The solid line represents optimal switching volume, whereas the dashed line represents optimal time spent in patch. Panel (c) labels: Reward Optimization Prediction; Optimal Reward; Optimal Time; Optimal Time in Patch (s).
Dependent measures. There were five primary dependent measures: number of patch changes, average time in a patch, average water volume rejected by leaving the patch ("rejection volume"/"indifference point"), water consumption (μl/min/kg), and the deviation from optimality. The first and final patches of each session, including associated patch changes, were not included in these calculations. The first patch was excluded because rats had not yet experienced the COD, and the last patch was excluded because it was terminated by the end of the session and so did not accurately represent the rats' choice behavior.
Rate of reinforcement was normalized to body weight: the total volume of water earned was divided by the time required to earn it and then by body weight (kg). Number of patch changes was the number of times rats chose to switch to the alternative receptacle ("leave" choices), averaged over the last two sessions. Time in patch was the mean duration the rat stayed at one receptacle before leaving for the other. The rejection volume/indifference point was defined as the mean amount of water (μl) available at the abandoned patch when the rat switched to the alternative snout poke location; e.g., if the last reward the rat earned before leaving was 120 μl, the rejection volume would be the next volume scheduled for delivery, 96 μl in this example. The optimal rejection volume, based on the MVT, was operationally defined as the volume of water that maximized the average rate of reward volume across all patches, including the COD travel time. This was calculated from the cumulative rate of return from the patch (μl/s) for each reward, considering the COD length (travel time to the patch) and the 4 s between successive rewards (Fig. 6c) ... 14 μl/s). In this instance, an optimal strategy predicts leaving after two rewards are obtained, giving a rejection volume of 96 μl. The percent deviation from the optimal rejection volume (referred to as percent volume deviation) was calculated as follows: (optimal rejection volume − observed rejection volume)/optimal rejection volume × 100. Positive values indicate that rats overharvested, staying in the patch after the reward volume had dropped below the value required to maximize reward rate, whereas negative values indicate that rats left when the volume was still greater than the optimal volume. The same approach was used to calculate the percent deviation from optimal time in patch (referred to as percent time deviation): (observed time in patch − optimal time in patch)/optimal time in patch × 100. Positive values indicate that the observed time in the patch was longer than optimal, and negative values indicate that it was shorter than optimal.
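To make the optimality calculation concrete, the following Python sketch searches for the number of rewards per visit that maximizes the overall rate of return and then computes the two percent deviations defined above. Several details are assumptions rather than the authors' stated method: the 20% decrement per reward is inferred from the 150, 120, 96 μl sequence in the text, and the timing rule (every reward, including the first, costs 4 s in the patch) is a simplification, so the optima it produces need not match Fig. 6c. All function names are hypothetical.

```python
FIRST_REWARD_UL = 150.0  # from the text
DECAY = 0.8              # assumption inferred from 150 -> 120 -> 96 ul
INTER_REWARD_S = 4.0     # 4 s between successive rewards (from the text)


def reward_volume(i):
    """Volume (ul) of the i-th reward within one patch visit (i = 1, 2, ...)."""
    return FIRST_REWARD_UL * DECAY ** (i - 1)


def optimal_visit(cod_s, max_rewards=50):
    """Return (n_rewards, rejection_volume_ul, time_in_patch_s, rate_ul_per_s).

    Maximizes cumulative volume / (COD + time in patch). Assumption: each
    reward, including the first, adds INTER_REWARD_S seconds of patch time.
    """
    best = None
    cumulative = 0.0
    for n in range(1, max_rewards + 1):
        cumulative += reward_volume(n)
        time_in_patch = INTER_REWARD_S * n
        rate = cumulative / (cod_s + time_in_patch)
        if best is None or rate > best[3]:
            # The rejection volume is the next reward that would have been delivered.
            best = (n, reward_volume(n + 1), time_in_patch, rate)
    return best


def percent_volume_deviation(optimal_ul, observed_ul):
    """Positive values mean overharvesting (left at a lower volume than optimal)."""
    return (optimal_ul - observed_ul) / optimal_ul * 100.0


def percent_time_deviation(observed_s, optimal_s):
    """Positive values mean the rat stayed longer than optimal."""
    return (observed_s - optimal_s) / optimal_s * 100.0
```

Under these assumptions, longer CODs favor longer visits and lower rejection volumes, which is the qualitative pattern the MVT predicts.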

Statistical analysis. Statistical analyses were performed using SPSS Statistics software (IBM, Armonk, NY). Descriptive statistics indicated that the distributions of the dependent measures were approximately normal (|skewness| < 1), so parametric statistics were used throughout.
To examine the effects of COD travel time on patch leaving, we used a mixed-factor analysis of variance (ANOVA), with delay as the within-subject factor (0, 6, 12, 18, and 24 s) and sex as the between-subject factor (male and female). Post-hoc comparisons used Bonferroni-corrected pairwise comparisons for within-subject effects and independent samples t-tests with Bonferroni corrections for between-subject effects. Effect sizes were reported as partial eta squared (ηp²) for ANOVA effects and Cohen's d (d) for post-hoc comparisons.
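For readers unfamiliar with these effect sizes and corrections, a minimal Python sketch of the standard textbook formulas follows. It is not the SPSS implementation used in the study; the function names are hypothetical, and the pooled-standard-deviation form of Cohen's d is assumed.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def partial_eta_squared(ss_effect, ss_error):
    """Partial eta squared from the effect and error sums of squares."""
    return ss_effect / (ss_effect + ss_error)


def bonferroni(p_values):
    """Bonferroni-adjusted p values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

With five delays, for example, a between-sex comparison at each delay would give five tests, so each raw p value would be multiplied by five before comparison with the alpha criterion.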
To examine whether analyses applied to traditional DD tasks were possible, hyperbolic equations were fitted to each rat's rejection volumes (indifference points), based on the equation described by Mazur 85 , using GraphPad Prism (GraphPad Software Inc., San Diego, CA):

$$V = \frac{bA}{1 + kD}$$

where V indicates the rejected volume of the diminishing reinforcer (in µl) when the rat left the current patch for the alternative patch, A represents the amount of water from the alternative patch (150 µl), and D represents the delay to receiving the 150-µl reinforcer (COD of 0, 6, 12, 18, or 24 s). The bias parameter, b, was calculated such that the product of b and A equaled each animal's indifference point at a 0-s delay 86 . The discount parameter (k) is an index of the rate of discounting or overall sensitivity to delayed reinforcers, such as the first reinforcer in an alternative patch. In DD tasks, larger values of k indicate steeper discount functions, stronger aversion to delayed reinforcers, more rapid devaluation of reinforcer value by delay, and thus greater impulsive choice. Here, larger values of k indicate higher relative levels of overharvesting and a preference for the smaller, sooner rewards available in the current patch over traveling to an alternative patch. The normalized area under the curve (AUC) of the discount function was also calculated, which summarizes the influence of delay length on the choice to remain at a patch location. The AUC provides a simple measure of overharvesting/discounting that is not tied to a particular discount function 87 . Smaller AUC values indicate higher levels of overharvesting. The k, b, and AUC values were analyzed using independent samples t-tests, with sex as the between-subject factor.
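The Python sketch below illustrates how k, b, and a normalized AUC could be obtained from one rat's five rejection volumes. It is not the GraphPad Prism procedure: as a simple stand-in for nonlinear regression, k is estimated by least squares through the origin on the linearized form bA/V − 1 = kD, with b fixed from the 0-s rejection volume as described above. Normalizing delays by the longest delay and volumes by A is an assumption about how the AUC was scaled, and all function names are hypothetical.

```python
A_UL = 150.0                   # amount available in the alternative patch (ul)
DELAYS_S = [0, 6, 12, 18, 24]  # CODs tested (s), in ascending order


def hyperbolic_value(delay_s, k, b, amount=A_UL):
    """Hyperbolic discount function with bias: V = b*A / (1 + k*D)."""
    return b * amount / (1.0 + k * delay_s)


def fit_k_linearized(delays, volumes, amount=A_UL):
    """Estimate (k, b) from rejection volumes.

    b is fixed so that b*A equals the rejection volume at the 0-s delay.
    k comes from least squares through the origin on b*A/V - 1 = k*D,
    a linearized stand-in for a nonlinear fit.
    """
    b = volumes[delays.index(0)] / amount
    numerator = sum(d * (b * amount / v - 1.0) for d, v in zip(delays, volumes))
    denominator = sum(d * d for d in delays)
    return numerator / denominator, b


def normalized_auc(delays, volumes, amount=A_UL):
    """Trapezoidal AUC with delays scaled to [0, 1] and volumes scaled by A."""
    max_delay = max(delays)
    xs = [d / max_delay for d in delays]
    ys = [v / amount for v in volumes]
    return sum(
        (x2 - x1) * (y1 + y2) / 2.0
        for x1, x2, y1, y2 in zip(xs, xs[1:], ys, ys[1:])
    )


# Synthetic, purely illustrative data generated from the model itself.
example_volumes = [hyperbolic_value(d, k=0.05, b=0.8) for d in DELAYS_S]
k_hat, b_hat = fit_k_linearized(DELAYS_S, example_volumes)
auc = normalized_auc(DELAYS_S, example_volumes)
```

Because the linearized estimate weights errors differently from a direct nonlinear fit, the two approaches can give somewhat different k values on noisy data.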
Composite scores for all variables were calculated as the sum of each variable across all delays. These composite scores were used to calculate Pearson's correlation coefficients. To assess differences in the strength of the variables' associations with DD and reward maximization measures, correlation coefficients were compared using Fisher r-to-z transformations 32,33 , which are recommended for comparing correlation coefficients from the same sample with one variable in common 88 . For all statistical tests, the alpha criterion was p < 0.05.
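The text does not specify which r-to-z test was applied. As one illustration, the Python sketch below implements the test of Meng, Rosenthal, and Rubin (1992) for two correlations that share a variable and come from the same sample. Treating this as the relevant test is an assumption, and the function name and arguments are hypothetical.

```python
from math import atanh, erfc, sqrt


def meng_z_test(r1, r2, r_x, n):
    """Compare two dependent correlations that share one variable.

    r1, r2: correlations of the shared variable with each of two other variables.
    r_x:    correlation between those two other variables.
    n:      sample size.
    Returns (z, two_tailed_p).
    """
    z1, z2 = atanh(r1), atanh(r2)  # Fisher r-to-z transformation
    r_sq_mean = (r1 ** 2 + r2 ** 2) / 2.0
    f = min(1.0, (1.0 - r_x) / (2.0 * (1.0 - r_sq_mean)))
    h = (1.0 - f * r_sq_mean) / (1.0 - r_sq_mean)
    z = (z1 - z2) * sqrt((n - 3) / (2.0 * (1.0 - r_x) * h))
    p = erfc(abs(z) / sqrt(2.0))  # two-tailed p from the standard normal
    return z, p
```

A call such as `meng_z_test(r1, r2, r_x, n)` returns a z statistic whose sign shows which of the two correlations is larger.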

Data availability
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.